Computing Information Gain in Data Streams
Authors
Abstract
Computing information gain in general data streams, where no assumptions are made about the underlying distributions or domains, is a hard problem that is severely constrained by limited memory. We present a simple randomized solution that is time- and space-efficient and whose relative error has a theoretical upper bound. It is based on a novel method for discretizing continuous domains using quantiles. Our empirical evaluation of the technique on standard and simulated datasets demonstrates its practicality and robustness. Our results include plots of accuracy versus memory usage and comparisons with a popular discretization technique.
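The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea, not the authors' method. It estimates the information gain of one continuous feature X about a class label Y over a stream, IG(Y; X) = H(Y) - sum over bins b of (n_b / n) * H(Y | b), where the bins are equal-frequency (quantile) intervals. As an assumption, memory is bounded with a uniform reservoir sample (Vitter's Algorithm R), which stands in for whatever quantile summary the paper uses. The class `StreamingInfoGain` and its `capacity`, `num_bins`, and `seed` parameters are hypothetical, and the sketch carries no error bound of the kind the abstract describes.

```python
import bisect
import math
import random
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Tuple


class StreamingInfoGain:
    """Hypothetical sketch: approximate the information gain of one continuous
    feature X about a class label Y over a stream, in bounded memory.

    Memory is bounded by a uniform reservoir sample (Algorithm R, an assumed
    stand-in for a quantile summary); bin boundaries are empirical quantiles
    of the sampled X values.
    """

    def __init__(self, capacity: int = 1000, num_bins: int = 10, seed: int = 0) -> None:
        self.capacity = capacity
        self.num_bins = num_bins
        self.rng = random.Random(seed)
        self.sample: List[Tuple[float, Hashable]] = []
        self.seen = 0  # number of stream items observed so far

    def update(self, x: float, y: Hashable) -> None:
        """Process one (feature value, label) pair from the stream."""
        self.seen += 1
        if len(self.sample) < self.capacity:
            self.sample.append((x, y))
        else:
            # Keep the new item with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.sample[j] = (x, y)

    @staticmethod
    def _entropy(counts: Iterable[int]) -> float:
        """Shannon entropy (in bits) of a distribution given by raw counts."""
        counts = list(counts)
        total = sum(counts)
        return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

    def cut_points(self) -> List[float]:
        """Equal-frequency (quantile) boundaries estimated from the sample."""
        xs = sorted(x for x, _ in self.sample)
        n = len(xs)
        return [xs[(i * n) // self.num_bins] for i in range(1, self.num_bins)]

    def information_gain(self) -> float:
        """IG(Y; X) = H(Y) - sum_b (n_b / n) * H(Y | bin b), on the sample."""
        if not self.sample:
            return 0.0
        cuts = self.cut_points()
        class_counts = Counter(y for _, y in self.sample)
        bins: Dict[int, Counter] = {}
        for x, y in self.sample:
            b = bisect.bisect_right(cuts, x)  # index of the quantile bin for x
            bins.setdefault(b, Counter())[y] += 1
        n = len(self.sample)
        h_y = self._entropy(class_counts.values())
        h_y_given_bin = sum(
            sum(c.values()) / n * self._entropy(c.values()) for c in bins.values()
        )
        return h_y - h_y_given_bin


if __name__ == "__main__":
    # A feature that perfectly separates two balanced classes carries 1 bit.
    g = StreamingInfoGain(capacity=1000, num_bins=10)
    for i in range(1000):
        x = i / 1000
        g.update(x, "a" if x < 0.5 else "b")
    print(g.information_gain())  # 1.0
```

Because the cut points are recomputed from the reservoir at query time, memory stays at `capacity` pairs regardless of stream length. A deterministic quantile summary such as Greenwald-Khanna could replace the reservoir if guaranteed rank error is needed.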
Similar resources
Adaptive Clustering for Monitoring Distributed Data Streams (SDM EDA 2014)
Monitoring data streams in a distributed system is a challenging problem with profound applications. The task of feature selection (e.g., by monitoring the information gain of various features) is an example of an application that requires special techniques to avoid a very high communication overhead when performed using straightforward centralized algorithms. The proposed approach enables mon...
Presenting a Framework for a Knowledge Management System in the Cloud Computing and Web 2.0 Environment
Today, data, information, and knowledge are very important assets for organizations, and effective knowledge management is considered a way to gain and sustain a competitive advantage in the highly dynamic environment in which organizations operate. With the growth of information and communication technologies, cloud computing and Web 2.0, as new phenomena, offer helpful solutions in the field...
Cloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming
The objective of this study is to verify the importance of cloud computing service capabilities in managing and analyzing big data in business organizations, because the rapid development in the use of information technology in general, and network technology in particular, has led many organizations to make their applications available for use via electronic platforms hos...
A Literature Review on Cloud Computing Security Issues
The use of Cloud Computing has increased rapidly in many organizations. Cloud Computing provides many benefits in terms of low cost and accessibility of data. In addition, Cloud Computing was predicted to transform the computing world from using local applications and storage into centralized services provided by organizations.[10] Ensuring the security of Cloud Computing is a major factor in the Clo...